The map of a city's streets is a particular case of a spatial complex network. However,
a city is not limited to its topology: it is above all a geometrical object whose particularity is to
be organized into short and long axes called streets. In this article we present and discuss two algorithms
that recover the notion of street from a graph representation of a city. We then show that
the lengths of the streets so obtained scale logarithmically. This phenomenon suggests that a
city is shaped by a logic of extension and division of space.



I. INTRODUCTION 

Traditionally the map, or equivalently the street network, of a city is represented by a graph
whose edges are portions of streets and whose vertices are their intersections. Since such a
graph is large (7000 vertices for an average city) and displays non-trivial patterns, it has
entered the fields of complex systems [1] and complex networks [2], [5]. In [4] the topology of
this graph is studied by means of random walks; [6], [7], [18], [20] study classical complex
network parameters; and [10] introduces spatiality by means of shortest path distances and
the notion of centrality.

However, this purely topological representation does not take into account the whole
geometrical information of a city. In this article we define geometrical and straight graphs,
plus an integral, which allow a city to be handled as a geometrical object, the graph structure
being only a skeleton that holds it up. The geometry of street segments is nevertheless
particular: they are coherently arranged into disjoint geometrical sets, the streets. Starting
from plain vector maps (i.e. vector collections of street segments), we seek to recover the
notion of street and thus to obtain a multi-scale representation of the city. At this point one
can really speak of street networks, which we represent mathematically by straight hypergraphs.

The street appears as a variant of the notions of axes and visibility graph used in the Space
Syntax framework [11], [13]-[15]. The visibility map is not robustly defined with respect to
small variations of a map: it is very sensitive to local curvature and to the sampling of the
map [21]. Various methods have been proposed to overcome this inconsistency. The notion of axis
is replaced in [17] by the notion of named-street: two axes are the same if they have the same
name in the database. In [11] two axes are merged if their angle is less than or equal to a
threshold (45°), but the resulting set of streets depends on the starting point of the
algorithm. The Intersection Continuity Principle is presented in [20]: two axes are merged
at an intersection if they make the largest convex angle among all angles at the intersection.
The originality of our approach is to define streets formally in a framework devoted to cities,
to propose two computationally optimized algorithms and to check their agreement with reality.

In a first part we introduce a formal framework to represent cities as both topological and
geometrical objects. Then we present two algorithms, each depending on a single parameter, to
partition street segments into streets. From a database of 109 (not truncated) French towns we
tune this parameter and assess the performance of each algorithm. Finally we study the
resulting distribution of street lengths. We show statistically from our database that street
lengths in a city follow a mixture of log-normal laws and interpret this as the result of an
extension / division of space process.

II. FORMAL REPRESENTATION OF CITY 
MAPS 

We represent a city by the notion of geometrical and straight graph. The vocabulary in use is
freely adapted from general graph and geometric graph theory [12]. The notion of straight graph
directly corresponds to that of planar straight line graph. The main difference is the point of
view we adopt and the topological and differential structures we provide on the set of
geometrical graphs; see [8], [9] for details.

A. Geometrical graphs 

A graph $G = (V, E)$ is a finite set of vertices $V$ together with a subset $E$ of $V \times V$.
If $|V|$ is large one would prefer to use the word network. If the elements of $V$ are points in
a Euclidean space we speak of spatial networks [2], and if the elements of $E$ are materialized
by geometrical curves that intersect only at their extremities, which are elements of $V$, we
say here that we have a geometrical graph. Hence a geometrical graph is both a topological
object (from $(V, E)$) and a geometrical one (the elements of $E$ are curves). When the
elements of $E$ are segments, we say that $G$ is a straight graph. In that case
$V = (v_1, \dots, v_n) \in (\mathbb{R}^2)^n$ and the graph is totally defined by these positions
and its adjacency matrix $A = (a_{ij})$.

B. Hypergraph additional structure 

A hypergraph is a graph whose edges can contain more than two nodes. If $G = (V, E)$ is a graph
and $R$ an equivalence relation on $E$, then the equivalence classes $E/R$ constitute
hyper-edges: $(V, E/R)$ is a hypergraph. In an urban context we can think of $R$ = "these edges
have the same street name". We present below two relations that define the city hypergraph
structure $H$ directly from the spatial information of $G$, without additional data. We write
$G = ((V, E), H)$.



The result is noisy, with detached structures (about 5 percent) that we erase by keeping only
the largest connected component of the graph. We also erase edges appearing several times. For
some algorithms it is more efficient to change the representation of the graph. For instance we
can change E into an adjacency matrix or into adjacency lists (for each vertex, a list of the
edges passing through it and another list of its adjacent vertices).
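The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `clean_graph` and `adjacency_lists` are hypothetical, vertices are assumed to be a list of `(x, y)` tuples and edges a list of index pairs, and dropping zero-length self-loops is an extra assumption.

```python
from collections import defaultdict


def clean_graph(vertices, edges):
    """Drop duplicate edges and keep only the largest connected component.

    vertices: list of (x, y) tuples; edges: list of (i, j) index pairs.
    Returns (new_vertices, new_edges) with re-indexed vertex references.
    """
    # Undirected edges: (i, j) and (j, i) are the same segment.
    # Assumption: self-loops (i == j) are sampling debris and are dropped.
    unique = sorted({(min(i, j), max(i, j)) for i, j in edges if i != j})

    neighbours = defaultdict(list)
    for i, j in unique:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Label connected components with an explicit stack (no recursion).
    component = {}
    sizes = []
    for start in range(len(vertices)):
        if start in component:
            continue
        label, size, stack = len(sizes), 0, [start]
        component[start] = label
        while stack:
            u = stack.pop()
            size += 1
            for w in neighbours[u]:
                if w not in component:
                    component[w] = label
                    stack.append(w)
        sizes.append(size)

    largest = max(range(len(sizes)), key=sizes.__getitem__)
    kept = [u for u in range(len(vertices)) if component[u] == largest]
    new_index = {old: new for new, old in enumerate(kept)}
    new_vertices = [vertices[u] for u in kept]
    new_edges = [(new_index[i], new_index[j]) for i, j in unique if i in new_index]
    return new_vertices, new_edges


def adjacency_lists(n_vertices, edges):
    """For each vertex: indices of incident edges and of adjacent vertices."""
    incident = [[] for _ in range(n_vertices)]
    adjacent = [[] for _ in range(n_vertices)]
    for e, (i, j) in enumerate(edges):
        incident[i].append(e)
        adjacent[i].append(j)
        incident[j].append(e)
        adjacent[j].append(i)
    return incident, adjacent
```

Both functions run in time linear in the number of vertices and edges (up to the initial sort of the edge set), which is what makes the adjacency-list representation attractive for the street-recovery step.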
The attribute table focuses on street segments, with additional information such as length and
name (the same name is attributed to the street segments that compose the same "named-street").
We will see that this table is more indicative than trustworthy.



III. TWO ALGORITHMS TO RECOVER 
STREETS 



C. City graphs 

A city graph is a straight graph representing the street network of a city. This kind of graph
has particular features, studied for instance in [6]. A city graph is written
$C = ((V, E), H)$, where $(V, E)$ is a straight graph and $H$ an additional hypergraph
structure. Elements of $E$ are called street segments; they have no physical meaning: they are
a sampling of the network. Elements of $H$ are called streets.

The degree $d$ is a function defined on $V$ that associates to each vertex the number of edges
that pass through it. We write $V = V_1 \cup V_2 \cup V_+$, with $V_1$ the vertices of degree 1,
called dead-ends, $V_2$ the vertices of degree 2, called junctions (and seen as sampling
artifacts), and $V_+$ the vertices of degree $\geq 3$, called intersections. $d$ can be extended
to each point on an edge: $\forall e \in E, \forall x \in \mathrm{Int}(e), d(x) = 2$. $(V, E)$
is a particular skeleton of $G$: any point in the interior of an edge can be added as an element
of $V_2$ without changing the overall structure. If $e \in E$, $V(e)$ is the set of extremities
of $e$ in $V$. If $v \in V$, $E(v)$ is conversely the set of edges passing through $v$, and if
$v \in V(e)$, $v(e)$ is the other extremity of $e$. An element $h \in H$ can be seen as a
subgraph of $C$ and induces a degree function $d_h$.
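As an illustration of this vocabulary, the partition of the vertices into dead-ends, junctions and intersections can be computed directly from the edge list. The sketch below assumes edges are stored as pairs of vertex indices; the function name `classify_vertices` is hypothetical.

```python
def classify_vertices(n_vertices, edges):
    """Return the degree of each vertex and the partition V1, V2, V+."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    dead_ends = [v for v in range(n_vertices) if degree[v] == 1]      # V1
    junctions = [v for v in range(n_vertices) if degree[v] == 2]      # V2
    intersections = [v for v in range(n_vertices) if degree[v] >= 3]  # V+
    return degree, dead_ends, junctions, intersections
```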



D. Data 

Maps are imported from a database of vector maps of French regions ("ESRI"). A set of 109
cities is extracted. For each of them we get a geometry file ".MIF" and an attribute table
".mdb".

The geometry of the street system is coded by a list of poly-lines. We subsample the ".MIF"
geometry, taking care to preserve the angles at the intersections. We create a structure V
containing the position of each vertex and a structure E containing, for each edge, two
references to V for its extremities. H is an array with as many elements as there are in E.
Each element is a "label" (an integer) coding for the hyperedge to which the edge belongs.



Let $G = (V, E)$ be a city graph. To compute an $H$ structure we use the following property:
if $\mathcal{R}$ is a reflexive and symmetric relation on $E$, then the relation $R$ on $E$
defined by

$$e_1 \, R \, e_2 \iff \exists\, \alpha_1 = e_1, \alpha_2, \dots, \alpha_n = e_2 \in E : \quad \alpha_1 \,\mathcal{R}\, \alpha_2,\ \alpha_2 \,\mathcal{R}\, \alpha_3,\ \dots,\ \alpha_{n-1} \,\mathcal{R}\, \alpha_n \qquad (1)$$

is an equivalence relation (the transitive closure of $\mathcal{R}$).
Notice that defining a hypergraph via an equivalence relation provides an algorithm that does
not depend on its starting point.



A. Angular tolerance (AT) 

We use the relation $\mathcal{R}_\theta$, depending on the angular parameter $\theta$:

$$e_1 \,\mathcal{R}_\theta\, e_2 \iff \exists\, v, v_1, v_2 \in V,\ e_1 = [v v_1],\ e_2 = [v v_2],\quad (d(v) = 2) \ \lor\ \left( |\angle(v v_1, v v_2) - \pi| < \theta \right) \qquad (2)$$

This relation considers that two adjacent street segments are part of the same street if they
meet at a junction, or if they meet at an intersection but remain almost aligned (Fig. 1,
left). This algorithm strongly risks producing "branched streets" (red solid line in Fig. 1,
left, and Fig. 2).

B. The minimal reciprocal alignment (MRA)

To define $S_\theta$ we place ourselves at a particular vertex $v$ and consider the set of edges
passing through it, $E(v) = \{e_1 = [v_1 v], \dots, e_n = [v_n v]\}$. We define $S_\theta$
iteratively with the variable $s$: (1) the initial set of "remaining edges" is set for
$s = 0$: $E_0 = E(v)$; (2) we consider all pairs of edges $(e_i, e_j)$, $1 \leq i < j \leq n$,
of $E_s$: $e_i \, S_\theta \, e_j$ iff

$$|\angle(v v_i, v v_j) - \pi| < \theta \quad \text{and} \quad \forall e_k = [v v_k] \in E_s \setminus \{e_i, e_j\}:\ |\angle(v v_i, v v_j) - \pi| < |\angle(v v_k, v v_i) - \pi| \ \text{ and } \ |\angle(v v_i, v v_j) - \pi| < |\angle(v v_k, v v_j) - \pi| \qquad (3)$$
FIG. 1. LEFT: Street 1 is branched (vertex D) and Street 2 contains a loop. RIGHT: At intersection A, segment 1 could
be associated with 3 or 2. The angle closest to $\pi$ is made with 3, but 3 and 4 form an angle that is reciprocally minimal. They
are associated and 1 goes with 2. The same reasoning leads to the same associations whatever the first segment considered.



Two edges are associated if they are the most aligned in $E_s$. (3) $E_{s+1}$ is $E_s$ without
the edges associated at step $s$. (4) We go on until $(E_s)$ stabilizes.

The reciprocity of the minimality condition ensures that $S_\theta$ is symmetric: if
$e_i \, S_\theta \, e_j$ then $e_j \, S_\theta \, e_i$.

For instance in Fig. 1, right: $E_0 = \{1, 2, 3, 4\}$, 3 and 4 are associated,
$E_1 = \{1, 2\}$, 1 and 2 are associated and the algorithm ends.
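The two local rules can be sketched in a few lines. This is an illustrative reading of Eqs. (2) and (3), not the authors' code: the function names, the representation of each segment by its direction vector leaving the vertex, and the strict inequalities (so that exact ties produce no pairing) are assumptions. The example directions are made up to reproduce the situation of Fig. 1, right.

```python
import math
from itertools import combinations


def deviation(d1, d2):
    """|angle(d1, d2) - pi| for two direction vectors leaving the same vertex."""
    diff = abs(math.atan2(d1[1], d1[0]) - math.atan2(d2[1], d2[0]))
    angle = min(diff, 2 * math.pi - diff)  # unsigned angle in [0, pi]
    return math.pi - angle


def at_related(d1, d2, degree, theta):
    """Angular tolerance: junction, or almost aligned at an intersection."""
    return degree == 2 or deviation(d1, d2) < theta


def mra_pairs(directions, theta=math.pi / 2):
    """Minimal reciprocal alignment at one vertex.

    directions: dict mapping a segment label to its direction vector (dx, dy).
    Returns the list of associated pairs, in the order they are found.
    """
    remaining = set(directions)
    pairs = []
    while True:
        found = []
        for i, j in combinations(sorted(remaining), 2):
            dev = deviation(directions[i], directions[j])
            if dev >= theta:
                continue
            others = remaining - {i, j}
            # (i, j) must be strictly better aligned than any pair (k, i) or (k, j).
            if all(deviation(directions[k], directions[i]) > dev and
                   deviation(directions[k], directions[j]) > dev
                   for k in others):
                found.append((i, j))
        if not found:
            return pairs
        for i, j in found:
            pairs.append((i, j))
            remaining -= {i, j}


def unit(degrees):
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


# Hypothetical directions: segment 1 is closest to alignment with 3,
# but 3 and 4 are reciprocally the most aligned, so 1 ends up with 2.
example = {1: unit(170), 2: unit(-30), 3: unit(0), 4: unit(180)}
print(mra_pairs(example))  # -> [(3, 4), (1, 2)]
```

Because the pairing test is symmetric in `i` and `j` and only uses the set of remaining segments, the result does not depend on which segment is examined first.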

C. Implementation 

Both algorithms can be implemented within the same skeleton by encapsulating each of them in a
function "Relation" with a boolean output, taking as parameters a vertex $v$, its set of edges
$E(v)$ and two distinct elements of it. The algorithm divides into two steps: (1) determine the
local relations between segments; (2) transform these relations into equivalence classes by
using Eq. (1). In the following code we identify objects with their index in an array.

FUNCTION H = Hypergraph(Graph)

    V = Graph.Vertices (v by 2 array)
    E = Graph.Edges (e by 2 array)
    H = new Array (e by 1), filled with 0     % 0 means "not labelled yet"
    Cor = new Array (e by 10), filled with 0  % related edges of each edge
    % {STEP 1: local relations between segments}

    FOR i = 1 to v
        EExtract = find e in E such that i in e   % this is E(i)
        FOR j < k in 1 .. size(EExtract)
            e1 = EExtract(j)
            e2 = EExtract(k)
            IF Relation(i, e1, e2, EExtract)
                Cor(e1, next available) = e2
                Cor(e2, next available) = e1
            END IF
        END FOR
    END FOR
    % {STEP 2: equivalence classes, Eq. (1)}

    CurrentMark = 1
    FOR i = 1 to e
        IF H(i) = 0
            stak = [i]
            WHILE notEmpty(stak)
                current = pop(stak)
                IF H(current) = 0   % skip edges that are already labelled
                    H(current) = CurrentMark
                    push(stak, nonzero entries of Cor(current, :))
                END IF
            END WHILE
            CurrentMark++
        END IF
    END FOR
END FUNCTION

With a plain graph structure, the complexity is $O(v \times e)$ (Step 1) and $O(e)$ (Step 2),
thus globally $O(v^2)$ (usually $e \approx 1.5\,v$). With an adjacency list (computed in
$O(e)$), Step 1 becomes $O(v)$ and the whole algorithm is $O(v)$.
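A compact Python version of this two-step skeleton, using adjacency lists, may help to see where the linear cost comes from. It is a sketch under assumptions: the function name `hypergraph`, the 0-based indexing and the signature `relation(v, e1, e2, incident_edges)` are illustrative choices, and the local relation is passed in as a callback so that either AT or MRA can be plugged in.

```python
def hypergraph(n_vertices, edges, relation):
    """Label each edge with the index (1, 2, ...) of the street it belongs to.

    edges: list of (i, j) vertex index pairs.
    relation(v, e1, e2, incident_edges) -> bool: local, symmetric relation.
    """
    # Adjacency list: edges incident to each vertex, built in O(e).
    incident = [[] for _ in range(n_vertices)]
    for e, (i, j) in enumerate(edges):
        incident[i].append(e)
        incident[j].append(e)

    # Step 1: local relations, examined vertex by vertex.
    related = [[] for _ in edges]
    for v in range(n_vertices):
        local = incident[v]
        for a in range(len(local)):
            for b in range(a + 1, len(local)):
                e1, e2 = local[a], local[b]
                if relation(v, e1, e2, local):
                    related[e1].append(e2)
                    related[e2].append(e1)

    # Step 2: equivalence classes (transitive closure) with an explicit stack.
    labels = [0] * len(edges)  # 0 means "not labelled yet"
    current_mark = 0
    for start in range(len(edges)):
        if labels[start]:
            continue
        current_mark += 1
        labels[start] = current_mark
        stack = [start]
        while stack:
            e = stack.pop()
            for f in related[e]:
                if not labels[f]:
                    labels[f] = current_mark
                    stack.append(f)
    return labels
```

Step 1 costs a number of relation tests proportional to the sum of squared vertex degrees, which is linear in the number of vertices when degrees are bounded, as is the case for street intersections. For example, with the rule "merge only at junctions", `hypergraph(5, [(0, 1), (1, 2), (2, 3), (2, 4)], lambda v, e1, e2, local: len(local) == 2)` groups the first two segments and leaves the two others alone.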



IV. TUNING AND PERFORMANCES 

We have specified AT and MRA with a single angular parameter $\alpha$ (the angular tolerance of
Section III). In practice we want the algorithm to recover the actual streets of a city. It is
hard to access this information with our data: there are as many streets as there are
different street names in the database, but for a particular city this number, although it can
be extracted, is not trustworthy. We simply try to reach the true number of streets. In
addition, AT and to a lesser extent MRA risk producing branched rather than straight streets.
We define the branching coefficient to describe this tendency and seek to minimize it.
In this section we assess the performance of the algorithms and deduce an optimal tuning of
$\alpha$ from a corpus of $N = 109$ major French towns $(C_1, \dots, C_N)$.



A. Criteria 

1. Recovering the number of streets

We assume that we know, for the $N$ cities, their actual numbers of streets $T_1, \dots, T_N$.
Let $\alpha \mapsto f_k(\alpha)$ be the function that associates to an angle $\alpha$ the
number of streets that one of our algorithms finds for city $k$. If the algorithm is relevant,
the quadratic error

$$\frac{1}{N} \sum_{k=1}^{N} \left( \frac{f_k(\alpha) - T_k}{T_k} \right)^2 \qquad (4)$$

is small. However $(T_k)$ is not accurate: some street segments have a blank "NAME" field, so
the database underestimates the number of streets. To get around this problem, we assume that
the error in the database is proportional to the proposed number of streets: the actual number
of streets of city $k$ is $(1 + \lambda) T_k$ for all $k$.
The criterion is rewritten as a function of $\alpha$ and $\lambda$:



$$\Gamma(\alpha, \lambda) = \frac{1}{N} \sum_{k=1}^{N} \left( \frac{f_k(\alpha) - (1 + \lambda) T_k}{(1 + \lambda) T_k} \right)^2 \qquad (5)$$



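A quick study of the behavior of the database allows us to assess that $0.1 < \lambda < 0.7$.
The condition $\partial \Gamma / \partial \lambda \, (\alpha, \lambda) = 0$ leads to a
functional relationship between $\alpha$ and $\lambda = \lambda_\alpha$:

$$\lambda(\alpha) = \frac{\sum_{k=1}^{N} \left( f_k(\alpha) / T_k \right)^2}{\sum_{k=1}^{N} f_k(\alpha) / T_k} - 1 \qquad (6)$$

and the criterion is rewritten as a function of $\alpha$ only:

$$\Gamma(\alpha) = \Gamma(\alpha, \lambda(\alpha)) \qquad (7)$$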
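The stationarity condition can be worked out explicitly. The short derivation below assumes that criterion (5) is the mean squared relative error between $f_k(\alpha)$ and $(1 + \lambda) T_k$; the auxiliary symbols $r_k$ and $u$ are introduced here for convenience and are not part of the original notation.

```latex
% Assumption: criterion (5) is the mean squared relative error.
% Auxiliary notation (introduced here): r_k = f_k(\alpha)/T_k and u = 1/(1+\lambda).
\begin{align*}
\Gamma(\alpha,\lambda)
  &= \frac{1}{N}\sum_{k=1}^{N}\left(\frac{f_k(\alpha)-(1+\lambda)T_k}{(1+\lambda)T_k}\right)^{2}
   = \frac{1}{N}\sum_{k=1}^{N}\left(u\,r_k-1\right)^{2},\\
\frac{\partial\Gamma}{\partial u}
  &= \frac{2}{N}\sum_{k=1}^{N} r_k\left(u\,r_k-1\right)=0
  \;\Longrightarrow\; u=\frac{\sum_k r_k}{\sum_k r_k^{2}},\\
\lambda(\alpha)
  &= \frac{1}{u}-1=\frac{\sum_k r_k^{2}}{\sum_k r_k}-1 .
\end{align*}
```

Since $du/d\lambda = -1/(1+\lambda)^2$ never vanishes, the condition in $u$ is equivalent to the condition in $\lambda$. As a sanity check, if every city satisfied $f_k(\alpha) = 1.5\,T_k$, the formula would give $\lambda = 2.25/1.5 - 1 = 0.5$, i.e. a database that misses one third of the streets.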
FIG. 2. The red solid graph is a street. Its number of
branches is 1 + 2 = 3.



since it leads to an aberrant value of $\lambda$. The branching coefficient is on average
$\Xi = 0.15$, which is slightly high but remains reasonable (Fig. 3, bottom-right). Fig. 3,
bottom-left shows the criterion for $\lambda$ constant and equal to 0.5. With the corrected
number of streets the criterion is convex and $\pi/5$ appears as a rather good and stable
minimum.



2. Branching coefficient 

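Let $H$ be a hypergraph structure computed from $C$ and $h \in H$ a street, seen as an
extracted subgraph of $C$. The number of branches in $h$ is defined by:

$$\xi(h) = \sum_{v \in h} \max(d_h(v) - 2, 0) \qquad (8)$$

To measure the branched aspect of $H$ we define its branching coefficient from the number of
branches of its streets:

$$\Xi(H) = \frac{\sum_{h \in H} \xi(h)}{\sum_{k > 2} (k - 2)\, d(k)} \qquad (9)$$

where $d(k)$ is read as the number of vertices of $C$ of degree $k$, the reading under which a
hypergraph made of a single street gives $\Xi = 1$.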



If none of the streets is branched, $\Xi = 0$, and if $H$ is composed of a single street, $H$
is maximally branched with $\Xi = 1$.
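The branching coefficient can be computed in one pass over the labelled edges. The sketch below assumes that the normalization is the total number of branches the graph would have if it were a single street (which is what makes the coefficient equal to 1 in that case); the function name `branching_coefficient` and the data layout (edge list plus one street label per edge) are illustrative.

```python
from collections import Counter


def branching_coefficient(edges, labels):
    """Ratio between the branches of all streets and the maximum possible.

    edges: list of (i, j) vertex index pairs; labels: street label of each edge.
    """
    total_degree = Counter()    # d(v) in the whole graph
    street_degree = Counter()   # d_h(v), keyed by (street label, vertex)
    for (i, j), h in zip(edges, labels):
        for v in (i, j):
            total_degree[v] += 1
            street_degree[(h, v)] += 1
    branches = sum(max(d - 2, 0) for d in street_degree.values())
    maximum = sum(max(d - 2, 0) for d in total_degree.values())
    # Assumption: a graph without intersections has coefficient 0.
    return branches / maximum if maximum else 0.0
```

For a four-way crossing whose segments are paired two by two the coefficient is 0; if the four segments are put in the same street it is 1.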



2. MRA 

The function $\Gamma(\alpha)$ is almost constant, equal to 0.2 (Fig. 4, top-left).
$\lambda_\alpha$ decreases exponentially towards an asymptotic value of 0.56 (Fig. 4,
top-right). In addition, $\forall \lambda \in [0, 1]$, $\Gamma(\alpha, \lambda)$ has an
asymptotic minimum (when $\alpha \to \pi/2$, Fig. 4, bottom-left). The choice of $\lambda$ is
hence not clear, but for every reasonable value of $\lambda$ the criterion is optimized for
$\alpha \to \pi/2$. Conversely $\lambda = 0.56$ is stable, since $\lambda_\alpha \approx 0.56$
$\forall \alpha \in [\pi/5, \pi/2]$; moreover this is the optimal value we found for AT, which
is reassuring. $\Xi_{MRA} < 0.01 \approx \Xi_{AT} \cdot 10^{-1}$ (Fig. 4, bottom-right), which
is very satisfactory. In fact $\alpha = \pi/2$ means that the best tuning of the algorithm is
"angle free": either the vertex under consideration is a junction or there is at least one
angle smaller than $\pi/2$. Consequently the condition on $\alpha$ is relaxed from $S_\alpha$
to $S^* = S_{\pi/2}$.

The global minimum is the same for the two algorithms, 0.21, but the branching coefficient is
much smaller for MRA. Branches in streets are marginal when using MRA. In practice we will use
MRA in its maximal version, which does not depend on the angle.



B. Analysis 

1. AT 

The function $\Gamma(\alpha)$ reaches its minimum (21%) around $\pi/5$ (Fig. 3, top-left). This
corresponds to $\lambda = 0.5$ (Fig. 3, top-right), which is coherent with the order of
magnitude we expressed. The absolute minimum is eliminated



V. STREET LENGTH DISTRIBUTION 

A. Empirical street length fitting 

In a city, there are long streets ensuring an efficient transportation system and small
streets distributed in a "fractal" way to provide habitation space. We thus expect the
distribution of street lengths $L$ to exhibit a wide range of values, or to scale
logarithmically. Fig. 5 plots the distribution of the logarithm of street length in the French
city of Amiens. The global shape of this histogram suggests two maxima and two (different)
normal tails. We assume that $\log L$ follows a mixture of two Gaussians (or equivalently that
$L$ follows a mixture of log-normal laws):

$$\log L \sim p_- \, \mathcal{N}(m_-, \sigma_-) + (1 - p_-) \, \mathcal{N}(m_+, \sigma_+) \qquad (10)$$

with $m_- < m_+$. The identification of this model has been performed with an Expectation
Maximization (EM) algorithm. Other cases show that the multi-log-normal distribution is
robust, even if it is possible to observe one or two maxima. For our whole database of French
towns we compute a bi-normal fit of $\log L$ and compute from a Kolmogorov-Smirnov test the
p-value of this fit: "$L$ follows a mixture of two log-normal laws" against "$L$ does not
follow a mixture of two log-normal laws". We have chosen this test rather than a Chi-2 test
for its robustness to distribution supports.
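The identification step can be illustrated with a plain EM iteration for a two-component Gaussian mixture on the log-lengths. This is a generic textbook EM written with the standard library only, not the authors' code; the function name `fit_two_gaussians`, the quartile-based initialization and the fixed number of iterations are assumptions.

```python
import math


def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def fit_two_gaussians(xs, iterations=100):
    """EM fit of p*N(m_minus, s_minus) + (1 - p)*N(m_plus, s_plus) to xs.

    xs: sample of log street lengths. Returns (p, (m_minus, s_minus), (m_plus, s_plus)).
    """
    xs = sorted(xs)
    n = len(xs)
    # Crude initialization (an assumption): quartiles as means, wide spreads.
    m1, m2 = xs[n // 4], xs[(3 * n) // 4]
    s1 = s2 = max((xs[-1] - xs[0]) / 4, 1e-6)
    p = 0.5
    for _ in range(iterations):
        # E-step: probability that each point belongs to the first component.
        resp = []
        for x in xs:
            a = p * normal_pdf(x, m1, s1)
            b = (1 - p) * normal_pdf(x, m2, s2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted means, standard deviations and mixing weight.
        w1 = sum(resp)
        w2 = n - w1
        if min(w1, w2) < 1e-9:
            break  # one component has vanished; keep the last estimate
        m1 = sum(r * x for r, x in zip(resp, xs)) / w1
        m2 = sum((1 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = max(math.sqrt(sum(r * (x - m1) ** 2 for r, x in zip(resp, xs)) / w1), 1e-6)
        s2 = max(math.sqrt(sum((1 - r) * (x - m2) ** 2 for r, x in zip(resp, xs)) / w2), 1e-6)
        p = w1 / n
    if m1 > m2:  # enforce m_minus < m_plus
        m1, s1, m2, s2, p = m2, s2, m1, s1, 1 - p
    return p, (m1, s1), (m2, s2)
```

Applied to the logarithms of the street lengths of one city, the function returns the weight of the lower mode and the mean and standard deviation of each mode. EM only finds a local optimum, so in practice several initializations may be worth trying.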

In Protocol 1 we have, for each city, computed the best parameters with an EM and computed
the p-value. It is often done this way in the literature. Nonetheless the statistic of the
test is changed if the parameters are estimated with the same data as those used for the test.
In the normal case it remains the same asymptotically. We have not found a generalization to
any distribution.
We propose a second protocol: since the Kolmogorov-Smirnov test is relevant from 100 samples
and our cities typically contain 500 to 1000 streets, we randomly divide each length
distribution into two parts, using one to estimate the parameters and the other to perform the
test. The estimation and the test are done with less data and are less accurate. Results for
both methods are summed up in Fig. 6 and Tab. 6. The closer the p-value is to 1, the more
relevant the hypothesis. Traditionally one considers that the hypothesis cannot be rejected if
the p-value is greater than 0.1. Let us focus on the second method. It is theoretically valid
but needs randomization. From one realization to another the p-value of a particular city may
change considerably, but the average p-value remains between 0.3 and 0.4. In 77% of cases the
hypothesis is not rejected and on average the p-value is 0.32, which is quite high.
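The second protocol can be sketched as follows: shuffle the log-lengths, estimate the mixture on one half and test the other half against the fitted distribution. Everything named here is an assumption made for illustration: `fit` stands for any estimator returning `(p, (m_minus, s_minus), (m_plus, s_plus))`, and the p-value uses the classical asymptotic Kolmogorov series with a small-sample correction rather than an exact distribution.

```python
import math


def mixture_cdf(x, p, comp_minus, comp_plus):
    """CDF of p*N(m_minus, s_minus) + (1 - p)*N(m_plus, s_plus) at x."""
    def phi(m, s):
        return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))
    return p * phi(*comp_minus) + (1 - p) * phi(*comp_plus)


def ks_test(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic and asymptotic p-value."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = cdf(x)
        d = max(d, (i + 1) / n - c, c - i / n)
    # Asymptotic Kolmogorov distribution with a finite-sample correction.
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)


def protocol_2(log_lengths, fit, rng):
    """Estimate on a random half, test on the other half.

    fit: callable returning (p, (m_minus, s_minus), (m_plus, s_plus)).
    rng: a random.Random instance, so that the split is reproducible.
    """
    data = list(log_lengths)
    rng.shuffle(data)
    half = len(data) // 2
    p, comp_minus, comp_plus = fit(data[:half])
    return ks_test(data[half:], lambda x: mixture_cdf(x, p, comp_minus, comp_plus))
```

Because the split is random, the p-value of a given city changes from one run to the next, which is why the text reports averages over realizations rather than a single value.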



B. Interpretation 

Log-normal laws are not rare in nature [19]. They appear in concentrations of elements,
latency periods of diseases, rainfall, permeability in plant physiology... They are
characteristic of multiplicative processes. We could then think that a city takes shape by
dividing former blocks into smaller blocks. This would lead us to consider the city as the
result of a division process, as in [3]. [22] recalls that, for isotropic planar tessellations
stable under iteration, the length of the typical "I-segment" (a street)
FIG. 4. Tuning and performances of the Minimal Reciprocal Angle algorithm (MRA). Top-left: $\Gamma(\alpha)$. Top-right: $\lambda(\alpha)$.
Bottom-left: $\Gamma(\alpha, \lambda)$ for $\lambda = 0.3$ to $0.7$ (its optimal value). Bottom-right: the increase of the mean branching coefficient
with $\alpha$, asymptotically below 1%.
FIG. 5. The distribution of the logarithm of street lengths
in Amiens (France). The red curve is the fit of this distribution
by a mixture of two Gaussians.



is long-tailed, but the result is not a log-normal. It is necessary to add a phenomenon to get
the log-normal distribution. It may be the extension of the city: people have a typical
transportation length. They accept to settle in a place where they have access to a constant
volume of resources at a distance smaller than this length. Then, when they cannot divide
blocks, they settle at the exterior of the city in larger blocks.

As for bimodality, it does not appear in every city. A social science explanation is the
succession of several transportation modes over time, or several populations building the city
with two different policies (inhabitants and industries, for instance).



VI. CONCLUSION 

We have presented a mathematical structure to consider a city not as a graph embedded in space
but as a geometrical object. Similarly to Horton's method [16] for breaking down tree
structures in hydrology, we have proposed a linear-time algorithm to recover streets in a
general geometric graph. This algorithm might have depended on a parameter but turns out to be
parameter free. Our algorithm is "more reliable" than the data. From the city hypergraph we
define a new centrality: the simplest centrality [8]. Contrary to other centralities such as
betweenness, closeness or straightness, this one varies smoothly and is free of side effects.
It allows us to emphasize important axes in a map and conversely to detect poorly served
zones. The behavior of street lengths leads us to think of the city as the result of a
morphogenetic process based on the duality extension / division of space [3].
FIG. 6. Main characteristics of the p-value distributions for
the test "the distribution follows a mixture of log-normals"
in 109 French cities. Protocol 1 estimates the parameters and
performs a Kolmogorov-Smirnov test with the same data.
Protocol 2 uses a (randomly chosen) half of the streets to
estimate the parameters and the other half to perform the test.